Effect of Spam on Hashtag Recommendation for Tweets
Authors
Abstract
The presence of spam tweets in a dataset may affect the choices of feature selection, algorithm formulation, and system evaluation for many applications. However, most existing studies have not considered the impact of spam tweets. In this paper, we study the impact of spam tweets on hashtag recommendation for hyperlinked tweets (i.e., tweets containing URLs) in the HSpam14 dataset. HSpam14 is a collection of 14 million tweets annotated as spam or ham (i.e., non-spam). In our experiments, we observe that it is much easier to recommend “correct” hashtags for spam tweets than for ham tweets, because of the near duplicates among spam tweets. Simple approaches such as recommending the most popular hashtags achieve very good accuracy on spam tweets. On the other hand, features that are highly effective on ham tweets may not be effective on spam tweets. Our findings suggest that if spam tweets are not removed from the data collection (as is the case in most studies), the results obtained for hashtag recommendation tasks could be misleading.
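The most-popular-hashtag baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tweet records, the function names `most_popular_hashtags` and `hit_rate`, and the hit-rate metric itself are assumptions made for the example, and the toy data is invented rather than drawn from HSpam14.

```python
from collections import Counter


def most_popular_hashtags(train_tweets, k):
    """Return the k most frequent hashtags in the training tweets."""
    counts = Counter(tag for tweet in train_tweets for tag in tweet["hashtags"])
    return [tag for tag, _ in counts.most_common(k)]


def hit_rate(recommended, test_tweets):
    """Fraction of test tweets with at least one true hashtag among the recommendations."""
    if not test_tweets:
        return 0.0
    rec = set(recommended)
    hits = sum(1 for tweet in test_tweets if rec & set(tweet["hashtags"]))
    return hits / len(test_tweets)


# Toy, made-up records (hypothetical; not taken from HSpam14).
train = [
    {"hashtags": ["followback"], "spam": True},
    {"hashtags": ["followback"], "spam": True},
    {"hashtags": ["followback", "teamfollow"], "spam": True},
    {"hashtags": ["python"], "spam": False},
    {"hashtags": ["nba"], "spam": False},
]
test = [
    {"hashtags": ["followback"], "spam": True},
    {"hashtags": ["teamfollow", "followback"], "spam": True},
    {"hashtags": ["climate"], "spam": False},
    {"hashtags": ["python"], "spam": False},
]

# Recommend the single most popular hashtag, then score spam and ham separately.
top = most_popular_hashtags(train, k=1)
spam_rate = hit_rate(top, [t for t in test if t["spam"]])
ham_rate = hit_rate(top, [t for t in test if not t["spam"]])
```

Scoring the spam and ham subsets separately, rather than reporting one pooled number, is the point of the sketch: a pooled score can hide the fact that a trivial baseline is carried by near-duplicate spam tweets.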
Similar resources
An analysis of 14 Million tweets on hashtag-oriented spamming
Over the years, Twitter has become a popular platform for information dissemination and information gathering. However, the popularity of Twitter has attracted not only legitimate users but also spammers who exploit social graphs, popular keywords, and hashtags for malicious purposes. In this paper, we present a detailed analysis of the HSpam14 dataset, which contains 14 million tweets with spa...
Impact of Feature Selection on Micro-Text Classification
Social media datasets, especially Twitter tweets, are popular in the field of text classification. Tweets are a valuable source of microtext (sometimes referred to as “micro-blogs”), and have been studied in domains such as sentiment analysis, recommendation systems, spam detection, clustering, among others [6]. Tweets often include keywords referred to as “Hashtags” that can be used as labels fo...
Une méthode collaborative pour identifier les spams: contribution à la qualité de l'information dans les réseaux sociaux
Preventing the actions of malicious users called "spammers" is a real challenge in maintaining a high level of performance in applications implemented in social networks. Conventional spam detection methods impose large and unavoidable processing times, for example up to months for processing large collections of tweets. These methods are entirely dependent on the supervised learning approach chosen to p...
Personalized Hashtag Suggestion for Microblogs
In microblogging services, users can generate hashtags to categorize their tweets. However, a majority of microblogs do not contain hashtags, which has intrigued active research on the problem of automatic hashtag recommendation for microblogs. Previous work conducted on this problem mostly does not take the user’s preference into consideration. In this paper, we propose a novel personalized ha...
Automatic Hashtag Recommendation in Social Networking and Microblogging Platforms Using a Knowledge-Intensive Content-based Approach
In social networking/microblogging environments, a #tag is often used for categorizing messages and marking their key points. Also, since some social networks such as Twitter apply restrictions on the number of characters in messages, #tags can serve as a useful tool for helping users express their messages. In this paper, a new knowledge-intensive content-based #tag recommendation system is intr...